Search for: All records

Creators/Authors contains: "Laguna, Ignacio"

  1. Successful HPC software applications are long-lived. When ported across machines and their compilers, these applications often produce different numerical results, many of which are unacceptable. Such variability is also a concern when optimizing code more aggressively for performance. Efficient tools that help locate the program units (files and functions) within which most of the variability occurs are badly needed, both to plan for code ports and to root-cause errors due to variability when they happen in the field. In this work, we offer an enhanced version of the open-source testing framework FLiT to serve these roles. Key new features of FLiT include a suite of bisection algorithms that help locate the root causes of variability; another added feature allows an analysis of the tradeoffs between performance and the degree of variability. Our new contributions also include a collection of case studies. Results on the MFEM finite-element library include variability/performance tradeoffs and the identification of a hitherto unknown abnormal level of result variability even under mild compiler optimizations. Results from studying the Laghos proxy application include identifying significantly divergent floating-point results and successfully root-causing them down to the problematic function in as few as 14 program executions. Finally, in an evaluation of 4,376 controlled injections of floating-point perturbations into the LULESH proxy application, we show that the FLiT framework achieves 100% precision and recall in discovering the file and function locations of the injections, all within an average of only 15 program executions. A minimal sketch of this style of file-level bisection appears after this list.
  2. An approach to reproducibility problems related to porting software across machines and compilers.
  3. Summary

    The Exascale Computing Project (ECP) focuses on the development of future exascale-capable applications. Most ECP applications use the message passing interface (MPI) as their parallel programming model, with mini-apps serving as proxies. This paper explores the explicit usage of MPI in such ECP proxy applications. We empirically analyze 14 proxy applications from the ECP Proxy Apps Suite, using the MPI profiling interface (PMPI) to collect their MPI usage patterns. Our analysis shows that a small subset of MPI features is commonly used in the proxies of exascale-capable applications, even when they reference third-party libraries. This study is intended to provide a better understanding of the use of MPI in current exascale applications. The findings can help focus software investments made for exascale systems in MPI middleware, including optimization, fault tolerance, tuning, and hardware offload. A minimal PMPI interposition sketch appears after this list.

  4. Summary

    Scientists from many different fields have been developing bulk-synchronous MPI applications to simulate and study a wide variety of scientific phenomena. Since failure rates are expected to increase in larger-scale future HPC systems, providing efficient fault-tolerance mechanisms for this class of applications is paramount. The global-restart model has been proposed to decrease failure-recovery time in bulk-synchronous applications by allowing a fast reinitialization of MPI. However, current implementations of this model have several drawbacks: they lack efficiency; their scalability has not been demonstrated; and they require the MPI profiling interface, which precludes the use of other PMPI-based tools. In this paper, we present EReinit, an implementation of the global-restart model that addresses these problems. Our key idea and optimization is the co-design of basic fault-tolerance mechanisms, such as failure detection, notification, and recovery, between MPI and the resource manager, in contrast to current approaches in which these mechanisms are implemented in MPI only. We demonstrate EReinit on three HPC programs and show that it is up to four times more efficient than existing solutions at 4,096 processes. A schematic sketch of the global-restart pattern appears after this list.
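
To make the bisection idea in item 1 concrete, here is a minimal sketch, in C, of locating a single culprit source file, assuming a hypothetical 30-file project with exactly one culprit. The variability_reproduced oracle is a stub with a hard-coded culprit index; a real oracle would rebuild the files in the candidate range with the aggressive optimization flags, the remainder with the safe baseline, rerun the test, and report whether the divergent result reappears. This illustrates the search strategy only; it is not FLiT's actual implementation.

```c
#include <stdio.h>

/* Stub oracle for this demo: pretends file index 11 is the culprit.
 * A real oracle would compile files[lo..hi] with aggressive flags,
 * everything else with baseline flags, rerun the test, and return 1
 * if the divergent result reappears. */
static int variability_reproduced(int lo, int hi) {
    const int culprit = 11;  /* hypothetical, for demonstration only */
    return lo <= culprit && culprit <= hi;
}

/* Bisect over the inclusive index range [lo, hi], assuming exactly
 * one culprit file; each step halves the range, so roughly log2(n)
 * rebuild-and-run cycles suffice. */
static int bisect_files(int lo, int hi) {
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (variability_reproduced(lo, mid))
            hi = mid;        /* culprit is in the lower half */
        else
            lo = mid + 1;    /* culprit is in the upper half */
    }
    return lo;
}

int main(void) {
    int n_files = 30;        /* hypothetical project size */
    printf("culprit file index: %d\n", bisect_files(0, n_files - 1));
    return 0;
}
```

With one culprit among 30 files, this narrows to the answer in about five oracle calls, consistent with the low execution counts the abstract reports. Plain bisection assumes a single culprit; FLiT's suite of bisection algorithms addresses more general cases.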
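
The MPI profiling interface (PMPI) used in item 3 works by letting a tool define its own MPI_* symbols that forward to the implementation's PMPI_* entry points. Below is a minimal interposer sketch that counts MPI_Send calls per rank; it shows the mechanism only and is not the instrumentation used in the paper.

```c
#include <mpi.h>
#include <stdio.h>

/* Minimal PMPI interposition sketch: the MPI_* symbols are weak in
 * MPI implementations, so this wrapper overrides MPI_Send and
 * forwards to the real implementation via PMPI_Send. */
static long send_calls = 0;

int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm) {
    send_calls++;
    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}

/* Report the tally when the application shuts MPI down. */
int MPI_Finalize(void) {
    int rank;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d: %ld MPI_Send calls\n", rank, send_calls);
    return PMPI_Finalize();
}
```

Compile this file into the application's link line (or into a shared library loaded with LD_PRELOAD) and the wrappers intercept every matching call without modifying the application's source.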
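
The global-restart model in item 4 can be pictured as a resilient main loop: on a failure notification, every surviving process jumps back to a common restart point, MPI is reinitialized in place, and execution resumes from the last checkpoint instead of the whole job being relaunched. The sketch below is purely schematic; on_global_failure, reinit_mpi_after_failure, load_checkpoint, and save_checkpoint are hypothetical placeholders, not EReinit's actual API.

```c
#include <mpi.h>
#include <setjmp.h>
#include <stdio.h>

static jmp_buf restart_point;

/* Hypothetical placeholders (stubbed here), NOT EReinit's real API. */
static void reinit_mpi_after_failure(void) { /* tear down and re-init MPI */ }
static int  load_checkpoint(void) { return 0; }  /* last completed step */
static void save_checkpoint(int step) { (void)step; }

/* In a real system, a failure-notification handler would invoke this
 * on every surviving process to trigger the global restart. */
void on_global_failure(void) { longjmp(restart_point, 1); }

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    if (setjmp(restart_point) != 0) {
        /* Global-restart path: reinitialize MPI quickly instead of
         * relaunching the job through the resource manager. */
        reinit_mpi_after_failure();
    }

    for (int step = load_checkpoint(); step < 100; ++step) {
        /* ... bulk-synchronous compute and MPI exchanges ... */
        if (step % 10 == 0) save_checkpoint(step);
    }

    MPI_Finalize();
    return 0;
}
```

Per the abstract, EReinit's co-design means the failure detection and notification feeding such a restart path come from the MPI runtime and the resource manager together, rather than being layered on top of MPI through the profiling interface.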